
feat: Add DuckDB Parquet tutorial notebook#116

Merged
Haleshot merged 3 commits into marimo-team:main from thliang01:main on Jul 11, 2025


@thliang01
Contributor

thliang01 commented on Jul 8, 2025

Create an interactive marimo notebook demonstrating Parquet file analysis with DuckDB. It features remote file querying, table creation, and Airbnb stock price visualization using Plotly.

  • Direct FROM clause queries on remote Parquet files
  • read_parquet() function for optimized column selection
  • SQL-based time series analysis with reactive cells

📝 Summary

📋 Checklist

  • I have included package dependencies in the notebook file using --sandbox
  • If adding a course, include a README.md
  • Keep language direct and simple

@thliang01
Contributor Author

Hi @akshayka @Haleshot,

I've implemented the DuckDB Parquet tutorial notebook as discussed in #48. The notebook demonstrates:

  • Loading remote Parquet files directly with DuckDB
  • Using read_parquet() for efficient column selection
  • Creating persistent tables from Parquet sources
  • Visualizing time series data with Plotly

The implementation uses the Airbnb stock dataset from Hugging Face as suggested in the issue. All dependencies are properly declared using marimo's sandbox format, and the content focuses on practical, beginner-friendly examples.

Would appreciate your review when you have a chance. Happy to make any adjustments based on your feedback.

Thanks!

@Haleshot
Contributor

@thliang01 Thanks a lot for the PR!

> Would appreciate your review when you have a chance. Happy to make any adjustments based on your feedback.

Will get to reviewing it shortly.

Contributor

Haleshot left a comment


Solid notebook & first notebook contrib covering DuckDB's Parquet capabilities 🎉; the progression from direct querying to read_parquet to persistent tables is great (the flow makes sense). The relevant Airbnb stock data analysis also helps with understanding (practical example).

Left some comments as part of the PR review; some minor nits/corrections.

- Add author attribution to notebook header
- Add sqlglot dependency for future SQL parsing capabilities
- Use consistent table references via variables instead of string literals
- Remove unused pyarrow import
- Improve markdown formatting for better readability

The notebook now properly references the created airbnb_stock table
through variables, making the code more maintainable and reducing
the risk of typos in table names.
@thliang01
Copy link
Contributor Author

@Haleshot - I've addressed the review comments and pushed a new commit. Main changes include adding author attribution and refactoring the SQL queries to use proper table variables. Ready for re-review when you have time!

thliang01 requested a review from Haleshot on July 11, 2025 08:56
Contributor

Haleshot left a comment


Great; thanks for addressing the comments so quickly and for the notebook contribution. LGTM 🚀

Haleshot merged commit 3cdd8a5 into marimo-team:main on Jul 11, 2025
2 checks passed

2 participants